38 research outputs found

    Tracking body and hands for gesture recognition: NATOPS aircraft handling signals database

    We present a unified framework for body and hand tracking, the output of which can be used for understanding simultaneously performed body-and-hand gestures. The framework uses a stereo camera to collect 3D images, and tracks body and hand together, combining various existing techniques to make the tracking tasks efficient. In addition, we introduce a multi-signal gesture database: the NATOPS aircraft handling signals. Unlike previous gesture databases, this one requires knowledge of both body and hand to distinguish gestures. It is also focused on a clearly defined gesture vocabulary from a real-world scenario that has been refined over many years. The database includes 24 body-and-hand gestures, and provides both gesture video clips and the body and hand features we extracted.

    Multi-signal gesture recognition using temporal smoothing hidden conditional random fields

    We present a new approach to multi-signal gesture recognition that attends to simultaneous body and hand movements. The system examines temporal sequences of dual-channel input signals, obtained via statistical inference, that indicate 3D body pose and hand pose. Learning gesture patterns from these signals can be quite challenging due to long-range temporal dependencies and a low signal-to-noise ratio (SNR). We incorporate a Gaussian temporal-smoothing kernel into the inference framework, capturing long-range temporal dependencies and efficiently increasing the SNR. An extensive set of experiments allows us to (1) show that combining body and hand signals significantly improves recognition accuracy; (2) report which features of body and hands are most informative; and (3) show that Gaussian temporal smoothing significantly improves gesture recognition accuracy. Supported by the United States Office of Naval Research (Science of Autonomy program, contract #N000140910625) and the National Science Foundation (grant #IIS-1018055).
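    As a rough illustration of the temporal-smoothing idea, the sketch below convolves each channel of a pose-feature sequence with a Gaussian kernel; the kernel width, radius, and feature dimensions are illustrative assumptions, not the paper's settings.

```python
# Hypothetical sketch: Gaussian temporal smoothing of a multi-channel
# pose-feature sequence, as a stand-in for the paper's smoothing kernel.
import numpy as np

def gaussian_kernel(sigma: float, radius: int) -> np.ndarray:
    t = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (t / sigma) ** 2)
    return k / k.sum()  # normalise so smoothing preserves signal scale

def smooth_sequence(x: np.ndarray, sigma: float = 2.0, radius: int = 5) -> np.ndarray:
    """x: (T, D) sequence of T frames with D body/hand features per frame."""
    k = gaussian_kernel(sigma, radius)
    # Convolve each feature channel independently along the time axis.
    return np.stack(
        [np.convolve(x[:, d], k, mode="same") for d in range(x.shape[1])], axis=1
    )

# Example: smooth a noisy 100-frame, 12-dimensional body/hand feature track
# before feeding it to the sequence classifier.
noisy = np.random.randn(100, 12)
smoothed = smooth_sequence(noisy)
```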

    Activity maps for location-aware computing

    The Problem: Location-based context is important for many applications. Previous systems offered only coarse room-level features or used manually specified room regions to determine fine-scale features. We propose a location-context mechanism based on activity maps, which define regions of similar context based on observations of 3-D patterns of location and motion in an environment. We describe an algorithm for obtaining activity maps in real time using spatio-temporal clustering of visual tracking data.

    Motivation: In many cases, fine-grained location-based information is preferred. One example is controlling lights and air conditioning: the desk lamp might light up and the air conditioning start whenever a user sits at his desk; in addition, the phone might become active and the computer screen wake from stand-by mode. Similarly, in a small group meeting the system could know where and how many people are in the room, and could make appropriate settings for lights, air conditioning, and computer tools. For each of these tasks, location context information is important [3]. Simply considering the instantaneous 3-D location of users is useful, but alone is insufficient as context information. Applications have to generalize context information from previous experience, and an application writer would like to access categorical context information, such as what activity typically occurs at a given location.
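    To make the clustering step concrete, here is a minimal sketch that groups 3-D tracking observations into candidate activity regions. DBSCAN stands in for the paper's spatio-temporal clustering, and the eps and min_samples values are assumptions.

```python
# Hypothetical sketch: derive activity regions by density-clustering 3-D
# tracking data; dense clusters mark places where people linger.
import numpy as np
from sklearn.cluster import DBSCAN

def activity_regions(tracks: np.ndarray, eps: float = 0.3, min_samples: int = 50) -> np.ndarray:
    """tracks: (N, 3) observed (x, y, z) positions in metres.
    Returns one label per observation; -1 marks sparse, non-activity points."""
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(tracks)

# Each non-negative label is a candidate activity region (e.g. "at the desk"),
# usable as a categorical location-context feature for applications.
positions = np.random.rand(2000, 3) * np.array([5.0, 4.0, 2.0])  # toy room data
labels = activity_regions(positions)
```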

    Predicting Online Media Effectiveness Based on Smile Responses Gathered Over the Internet

    We present an automated method for classifying “liking” and “desire to view again” based on over 1,500 facial responses to media collected over the Internet. This is a very challenging pattern recognition problem that involves robust detection of smile intensities in uncontrolled settings and classification of naturalistic, spontaneous temporal data with large individual differences. We examine the manifold of responses and analyze the false positives and false negatives that result from classification. The results demonstrate the possibility of an ecologically valid, unobtrusive evaluation of commercial “liking” and “desire to view again”, strong predictors of marketing success, based only on facial responses. The area under the curve for the best “liking” and “desire to view again” classifiers was 0.80 and 0.78, respectively, under a challenging leave-one-commercial-out testing regime. The technique could be employed to personalize video ads presented to people while they view programming over the Internet, or in copy testing of ads to unobtrusively quantify effectiveness. Supported by the MIT Media Lab Consortium.
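    The leave-one-commercial-out protocol can be sketched as follows: all responses to one commercial are held out per fold, and AUC is computed over the pooled held-out scores. The features, labels, and logistic-regression classifier here are placeholders, not the paper's model.

```python
# Hypothetical sketch of leave-one-commercial-out evaluation with AUC.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneGroupOut

def loco_auc(X: np.ndarray, y: np.ndarray, commercial_ids: np.ndarray) -> float:
    scores = np.empty(len(y))
    for train, test in LeaveOneGroupOut().split(X, y, groups=commercial_ids):
        clf = LogisticRegression(max_iter=1000).fit(X[train], y[train])
        scores[test] = clf.predict_proba(X[test])[:, 1]
    return roc_auc_score(y, scores)  # AUC over pooled out-of-fold scores

# X: per-response smile-intensity summary features; y: binary "liking" labels;
# the group array records which commercial each response watched.
X = np.random.randn(300, 8)
y = np.random.randint(0, 2, 300)
ads = np.random.randint(0, 10, 300)
print(loco_auc(X, y, ads))
```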

    Avoiding the "streetlight effect": tracking by exploring likelihood modes

    Classic methods for Bayesian inference effectively constrain search to lie within regions of significant probability of the temporal prior. This is efficient with an accurate dynamics model, but is otherwise prone to ignore significant peaks in the true posterior. A more accurate posterior estimate can be obtained by explicitly finding modes of the likelihood function and combining them with a weak temporal prior. In our approach, modes are found using efficient example-based matching, followed by local refinement to find peaks and estimate peak bandwidth. By reweighting these peaks according to the temporal prior, we obtain an estimate of the full posterior model. We show comparative results on real and synthetic images in a high-degree-of-freedom articulated tracking task.
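    A minimal sketch of the reweighting step, assuming Gaussian likelihood modes and a Gaussian temporal prior (the paper's exact forms may differ): each mode keeps its likelihood weight multiplied by the prior density at its location, and the weights are renormalised into a posterior mixture.

```python
# Hypothetical sketch: combine likelihood modes with a weak temporal prior.
import numpy as np

def gaussian(x: np.ndarray, mean: np.ndarray, var: float) -> float:
    d = x - mean
    return float(np.exp(-0.5 * d @ d / var) / np.sqrt((2 * np.pi * var) ** len(x)))

def posterior_mixture(modes, weights, bandwidths, prior_mean, prior_var):
    """Reweight each likelihood mode by the temporal prior evaluated at it."""
    w = np.array([wi * gaussian(m, prior_mean, prior_var)
                  for m, wi in zip(modes, weights)])
    w /= w.sum()  # renormalise into a proper mixture over the modes
    return list(zip(w, modes, bandwidths))

# Two candidate pose modes; the weak prior favours the one near the last pose,
# without discarding the distant mode outright.
modes = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]
posterior = posterior_mixture(modes, weights=[0.6, 0.4], bandwidths=[0.1, 0.2],
                              prior_mean=np.array([0.2, 0.1]), prior_var=1.0)
```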

    Le mouvement projectif : théorie et applications pour l'autocalibrage et la segmentation du mouvement (Projective motion: theory and applications for self-calibration and motion segmentation)

    Stereo vision appears in many applications as an easy way to obtain 3D data from images. Stereo approaches usually rely on Euclidean models and require a full calibration of the stereo rig, implying that the intrinsic parameters of the cameras are known, as well as the relative position and orientation of the cameras. However, a full and accurate calibration usually requires the help of a human operator. In cases when an operator cannot be involved, weakly calibrated systems, for which only the epipolar geometry is known, are a good alternative. A weak calibration is easy to obtain, but the difficulty is that the 3D data are then obtained in a projective space rather than a Euclidean one. This document describes the use of weakly calibrated systems performing motions in an a priori unknown scene. It shows how the motion of the system can be used to retrieve the metric structure of the scene (by self-calibration) and to detect moving objects. The projective space is used here to represent the visual information associated with the system. In particular, we study the 3D projective transformations, also called 3D homographies, that map the projective reconstructions of the same rigid scene onto one another. We introduce the problem of estimating such 3D homographies and show how these transformations can be used in applications such as self-calibration and motion segmentation.
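    To illustrate what estimating a 3D homography involves, here is a DLT-style sketch (our assumption, not necessarily the thesis's estimator) that recovers the 4x4 transformation H with X2 ~ H X1 from homogeneous point correspondences between two projective reconstructions. H has 15 degrees of freedom and each correspondence contributes 3 independent constraints, so at least 5 points are needed.

```python
# Hypothetical sketch: direct linear transform for a 4x4 projective
# transformation (3D homography) between two projective reconstructions.
import numpy as np
from itertools import combinations

def estimate_3d_homography(X1: np.ndarray, X2: np.ndarray) -> np.ndarray:
    """X1, X2: (N, 4) homogeneous point correspondences, N >= 5.
    For each coordinate pair (i, j), scale invariance gives the constraint
    X2[i] * (H @ X1)[j] - X2[j] * (H @ X1)[i] = 0, linear in the entries of H."""
    rows = []
    for x1, x2 in zip(X1, X2):
        for i, j in combinations(range(4), 2):
            r = np.zeros(16)
            r[4 * j:4 * j + 4] = x2[i] * x1   # block acting on row j of H
            r[4 * i:4 * i + 4] -= x2[j] * x1  # block acting on row i of H
            rows.append(r)
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    return Vt[-1].reshape(4, 4)  # null vector of the system, up to scale

# Toy check: recover a random homography from six exact correspondences.
H_true = np.random.randn(4, 4)
X1 = np.random.randn(6, 4)
X2 = (H_true @ X1.T).T
H_est = estimate_3d_homography(X1, X2)
print(np.allclose(H_est / H_est[0, 0], H_true / H_true[0, 0], atol=1e-6))
```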

    Motion-egomotion discrimination and motion segmentation from image-pair streams

    Given a sequence of image pairs, we describe a method that segments the observed scene into static and moving objects while rejecting badly matched points. We show that, using a moving stereo rig, the detection of motion can be solved in a projective framework and therefore requires no camera calibration. Moreover, the method allows for articulated objects. First, we establish the projective framework enabling us to characterize rigid motion in projective space. This characterization is used in conjunction with a robust estimation technique to determine egomotion. Second, we describe a method based on data classification which further considers the non-static scene points and groups them into several moving objects. Third, we introduce a stereo-tracking algorithm that provides the point-to-point correspondences needed by the algorithms. Finally, we show some experiments involving a moving stereo head observing both static and moving objects. © 2000 Academic Press
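    As an illustration of the egomotion-then-segmentation idea, the sketch below robustly fits the dominant rigid motion between two reconstructed point sets with RANSAC and labels its outliers as candidate moving objects. For simplicity it works in Euclidean space with a Kabsch fit, whereas the paper's method operates in projective space and needs no calibration.

```python
# Hypothetical sketch: RANSAC egomotion fit, then motion segmentation.
import numpy as np

def rigid_fit(P: np.ndarray, Q: np.ndarray):
    """Least-squares R, t with Q ~ P @ R.T + t (Kabsch algorithm)."""
    cp, cq = P.mean(0), Q.mean(0)
    U, _, Vt = np.linalg.svd((P - cp).T @ (Q - cq))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:  # reject reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cq - R @ cp

def segment_static(P: np.ndarray, Q: np.ndarray, iters: int = 200,
                   thresh: float = 0.05, seed: int = 0) -> np.ndarray:
    """RANSAC over minimal 3-point samples; returns a mask of static points.
    P, Q: (N, 3) matched 3-D points reconstructed at times t and t+1."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(P), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(P), 3, replace=False)
        R, t = rigid_fit(P[idx], Q[idx])
        inliers = np.linalg.norm(Q - (P @ R.T + t), axis=1) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    return best  # ~best marks points on independently moving objects
```

    The non-static points (the mask's complement) would then be grouped into individual moving objects, for example by clustering them under per-group rigid-motion fits as the abstract describes.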